A data Grid prototype for distributed data production in CMS
The CMS experiment at CERN is setting up the Grid infrastructure required to meet the needs of terabyte-scale productions over the next few years. The goal is to automate production while still allowing users to interact with the system, when required, to make decisions that optimize performance. We present the architecture, design and functionality of our first working Objectivity file replication prototype. The middleware of choice is the Globus toolkit, which provides promising functionality. Our results demonstrate that the Globus toolkit can serve as an underlying technology for a world-wide Data Grid. The required data management functionality includes high-speed file transfers, secure access to remote files, selection and synchronization of replicas, and management of meta-information. The whole system is expected to be flexible enough to incorporate site-specific policies. Data management granularity is at the file level rather than the object level. The first prototype is currently in use for the High Level Trigger (HLT) production (autumn 2000). Owing to these efforts, CMS is one of the pioneers in using Data Grid functionality in a running production system. The project can be viewed as an evaluator of different strategies, a test of the capabilities of middleware tools and a provider of basic Grid functionality.
Data Grid tutorials with hands-on experience
Grid technologies are increasingly used in scientific as well as industrial environments, but documentation is often insufficient and correct usage is not well understood. Comprehensive training with hands-on experience helps people first to understand the technology and then to use it correctly and efficiently. We have organised and run several training sessions in different locations all over the world and report on our experience. The major factors of success are a solid base of theoretical lectures and, more importantly, a facility that allows for practical Grid exercises during and possibly after tutorial sessions.
Grid Data Management in Action: Experience in Running and Supporting Data Management Services in the EU DataGrid Project
In the first phase of the EU DataGrid (EDG) project, a Data Management System
has been implemented and provided for deployment. The components of the current
EDG Testbed are: a prototype of a Replica Manager Service built around the
basic services provided by Globus, a centralised Replica Catalogue to store
information about physical locations of files, and the Grid Data Mirroring
Package (GDMP) that is widely used in various HEP collaborations in Europe and
the US for data mirroring. During this year these services have been refined
and made more robust so that they are fit to be used in a pre-production
environment. Application users have been using this first release of the Data
Management Services for more than a year. In the paper we present the
components and their interaction, our implementation and experience as well as
the feedback received from our user communities. We have resolved not only
issues regarding integration with other EDG service components but also many of
the interoperability issues with components of our partner projects in Europe
and the U.S. The paper concludes with the basic lessons learned during this
operation. These conclusions provide the motivation for the architecture of the
next generation of Data Management Services that will be deployed in EDG during
2003.
Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics
(CHEP03), La Jolla, Ca, USA, March 2003, 9 pages, LaTeX, PSN: TUAT007 all
figures are in the directory "figures
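The Replica Catalogue mentioned in this abstract stores information about the physical locations of files. As a minimal sketch of that idea only, the following keeps an in-memory mapping from a logical file name to its known physical replicas. The class name, method names and example URLs are hypothetical and are not taken from EDG, GDMP or Globus.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Toy in-memory catalogue (hypothetical, not the EDG implementation).

    Maps a logical file name (LFN) to the set of physical file names (PFNs)
    at which copies of that file are stored.
    """

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        """Record that a copy of `lfn` exists at location `pfn`."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn, pfn):
        """Forget one replica; drop the LFN entry once no replicas remain."""
        self._replicas[lfn].discard(pfn)
        if not self._replicas[lfn]:
            del self._replicas[lfn]

    def lookup(self, lfn):
        """Return all known locations of `lfn` in sorted order."""
        return sorted(self._replicas.get(lfn, ()))


# Example usage with made-up site names.
catalogue = ReplicaCatalogue()
catalogue.register("lfn:run42.db", "gsiftp://site-a.example.org/data/run42.db")
catalogue.register("lfn:run42.db", "gsiftp://site-b.example.org/data/run42.db")
locations = catalogue.lookup("lfn:run42.db")
```

A real catalogue would also need persistence, access control and consistency between sites, none of which this sketch attempts.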
gcodeml: A Grid-enabled Tool for Detecting Positive Selection in Biological Evolution
One of the important questions in biological evolution is whether certain
changes along protein-coding genes have contributed to the adaptation of
species. This problem is known to be biologically complex and computationally
very expensive. It therefore requires efficient Grid or cluster solutions to
overcome the computational challenge. We have developed a Grid-enabled tool
(gcodeml) that relies on the PAML (codeml) package to help analyse large
phylogenetic datasets on both Grids and computational clusters. Although we
report on results for gcodeml, our approach is applicable and customisable to
related problems in biology or other scientific domains.
Comment: 10 pages, 4 figures. To appear in the HealthGrid 2012 con
Towards a Swiss National Research Infrastructure
In this position paper we describe the current status and plans for a Swiss
National Research Infrastructure. Swiss academic and research institutions are
very autonomous. While being loosely coupled, they do not rely on any
centralized management entities. Therefore, a coordinated national research
infrastructure can only be established by federating the various resources
available locally at the individual institutions. The Swiss Multi-Science
Computing Grid and the Swiss Academic Compute Cloud projects already serve a
large number of diverse user communities. These projects also allow us to test
the operational setup of such a heterogeneous federated infrastructure.
Optimization strategies for fast detection of positive selection on phylogenetic trees
Motivation: The detection of positive selection is widely used to study gene and genome evolution, but its application remains limited by the high computational cost of existing implementations. We present a series of computational optimizations for more efficient estimation of the likelihood function on large-scale phylogenetic problems. We illustrate our approach using the branch-site model of codon evolution. Results: We introduce novel optimization techniques that substantially outperform both CodeML from the PAML package and our previously optimized sequential version SlimCodeML. These techniques can also be applied to other likelihood-based phylogeny software. Our implementation scales well for large numbers of codons and/or species. It can therefore analyse substantially larger datasets than CodeML. We evaluated FastCodeML on different platforms and measured average sequential speedups of FastCodeML (single-threaded) versus CodeML of up to 5.8, average speedups of FastCodeML (multi-threaded) versus CodeML on a single node (shared memory) of up to 36.9 for 12 CPU cores, and average speedups of the distributed FastCodeML versus CodeML of up to 170.9 on eight nodes (96 CPU cores in total). Availability and implementation: ftp://ftp.vital-it.ch/tools/FastCodeML/.